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ABSTRACT 

We introduce a new set of large N-body runs, the MICE simulations, that provide 
a unique combination of very large cosmological volumes with good mass resolution. 
They follow the gravitational evolution of ~ 8.5 billion particles (2048 3 ) in volumes 
covering up to ~ 15 Hubble volumes (i.e., 450 h~ 3 Gpc 3 ), and sample over 5 decades 
in spatial resolution. Our main goal is to accurately model and calibrate basic cosmo- 
logical probes that will be used by upcoming astronomical surveys of unprecedented 
volume. Here we take advantage of the very large volumes of MICE to make a ro- 
bust sampling of the high- mass tail of the halo mass function (MF). We discuss and 
avoid possible systematic effects in our study, and do a detailed analysis of different 
error estimators. We find that available fits to the local abundance of halos (War- 
ren et al. (2006)) match well the abundance estimated in the large volume of MICE 
up to M ~ 10 1 ft^'Mg, but significantly deviate for larger masses, underestimat- 
ing the mass function by 10% (30%) at M = 3.16 x IO^/i^Mq (10 15 h' 1 M Q ). 
Similarly, the widely used Sheth & Tormen (1999) fit, if extrapolated to high red- 
shift assuming universality, leads to an underestimation of the cluster abundance 
by 30%, 20% and 15% at z = 0, 0.5, 1 for fixed v — S c /a « 3 (corresponding to 
M ~ [7-2.5-0.8] x 10 14 h' 1 M respectively). We provide a re-calibration of the MF 
over 5 orders of magnitude in mass (10 10 < Mj ( h~ x M Q ) < 10 15 ), that accurately de- 
scribes its redshift evolution up to z = 1. We explore the impact of this re-calibration 
on the determination of the dark-energy equation of state w, and conclude that using 
available fits that assume universal evolution for the cluster MF may systematically 
bias the estimate of w by as much as 50% for medium-depth (z £ 1) surveys. The 
halo catalogu es used in t his analysis are publicly available at the MICE webpage, 



http://www.ice.cat/mice 



1 INTRODUCTION 

Near future extra-galactic surveys will sample unprecedent- 
edly large cosmological volumes, in the order of tens of cu- 
bic gigaparsecs, by combining wide fields with deep spec- 
troscopy or photometry, typically reaching z ~ 1 (e.g. DES, 
PAU, BOSS, PanSTARRS, WiggleZQ). In addition they will 
be able to capture very faint objects and lower their shot- 
noise level to become close to sampling variance limited. Op- 
timizing the preparation and scientific exploitation of these 
upcoming large surveys requires accurate modeling and sim- 
ulation of the expected huge volume of high quality data. 
This is quite a non-trivial task, because it involves simulat- 



1 DE S Jhttp:// www.darkenergysurvey.org/ ); 
PAU ( htt p : 77 www . ice . csic .es/ pau/Survey. html) ; 
BOSS jhtFp://cosmology.lbl.gov/BOSS/|l; 
PanSTARRS (http:/ /pan-starrs. ifa.hawaii.edu); 
WiggleZ (http:/ /wigglez. swin.edu.au/ ) 



ing a wide dynamic range of cosmological distances in order 
to accurately sample the large-scale structure, and well re- 
solved dark-matter halos as a proxy to galaxy clusters, to 
model the physics of galaxy formation and other nonlinear 
physics. 

Over the past decades numerical simulations have pro- 
vided one of the most valuable tools to address these issues, 
and their relevance will certainly increase in the near future. 
They allow to follow the growth of cosmological structure, 
shed light on the process of galaxy formation, model non- 
linear effects entering different clustering measures, lensing 
and redshift distortions, track the impact of a dark-energy 
component and more. Among projects related to the devel- 
opment of very large- simulations are those carried out by 
the Virgo consortium (Frenk et al. 2000), the Millennium I 
and II (Springel et al. 2005; Boylan-Kolchin et al. 2009), 
the Horizon run (Kim et al. 2008) and the Horizon project 
(Teyssier et al. 2009). They have all benefited by the vast 
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computational power and hardware developed over the past 
years. 

In this paper we present a new effort to tackle the 
demand of large simulations and mock catalogues, the 
MareNostrum Institut de Ciencies de l'Espai (MICE) simu- 
lations, that aims at the development of very large and com- 
prehensive N-body runs to deliver an unprecedented combi- 
nation of large simulated volumes with good mass resolution. 

As a first step, we have developed two N-Body simula- 
tions including more than 8 billion particles (2048 3 ) each, in 
volumes similar and well beyond the one corresponding to 
the Hubble length (~ 30 h~ 3 Gpc 3 ), in addition to several 
other large runs of typically smaller volume and correspond- 
ing higher mass resolution that are complementary to the 
large volume runs. 

Some of the MICE simulations used in this paper have 
already been used to develop the first full-sky weak-lensing 
maps in the lightcone (Fosalba et al. 2008), or study the 
clustering of LRG galaxies with multiple-band photometric 
surveys such as PAU (Benitez et al. 2009). More recently, 
using the largest volume simulations, a series of papers has 
studied the large-scale clustering in the spectroscopic LRG 
SDSS sample, through the redshift space distortions (Cabre 
& Gaztafiaga 2009a; Cabre & Gaztanaga 2009b), the 
baryon oscillations in the 3-point function (Gaztanaga et 
al. 2008a), and in the radial direction (Gaztanaga et al. 
2008b). 

In this paper we will focus on the mass function of the 
most massive objects formed through hierarchical cluster- 
ing, since their low abundance makes the need of large sam- 
pling regions crucial. In turn, a precise description of this 
regime is of paramount importance since the abundance of 
clusters, to which it corresponds to, is very sensitive to cos- 
mological parameters (particularly the matter density), the 
normalization of the matter power spectrum and the ex- 
pansion history of the universe, characterized by the dark 
energy density and its equation of state (e.g. see Rozo et al. 
2009; Cunha 2009; Mantz et al. 2008; Henry et al. 2009; 
Vikhlinin et al. 2009 and references therein). This regime 
is also one of the best probes to search for primordial non- 
Gaussianities (e.g. Matarrese et al. 2000; Grossi et al. 2007; 
Pillepich et al. 2008; Maggiore & Riotto 2009) 

The halo mass function and related topics have been 
extensively studied in the literature. Analytic models pre- 
dicting not only the abundance as a function of mass but 
also the evolution were developed as far back as the 70's 
by (Press & Schechter 1974) and followed by (Bond et al. 
1991; Lee & Shandarin 1998; Sheth et al. 2001). How- 
ever the development of N-body simulations showed that 
these predictions were in general not sufficiently accurate 
for cosmological applications, and demanded the need for 
calibrations against numerical results (see also Robertson 
et al. 2009). Although N-body studies were likewise early 
developed (e.g. two 1000 particles simulation by Press & 
Schechter 1974), the reference work in this directions was 
set by (Sheth & Tormen 1999) and (Jenkins et al. 2001). 
More recent re-calibrations of the mass function to within 
few percent were put forward by (Warren et al. 2006; Tinker 
et al. 2008) . In addition these or other papers have focused 



their attention on the redshift evolution of the mass func- 
tion (Reed et al. 2003; Reed et al. 2006; Lukic et al. 2007; 
Cohn & White 2007) , different definitions of halo and halo 
mass (White 2001; White 2002; Tinker et al. 2008) or the 
impact of gas physics associated with halo baryons (Stanek 
et al. 2009a). 

In this paper we combine the effect of long-wavelength 
modes whose contribution can only be studied with the un- 
precedented volume of the MICE simulations (~ 30, 100 and 
450 h~ 3 Gpc 3 ), with good mass resolution and controlled 
systematics to investigate how well available fits describe 
the high-mass end tail of the halo mass function. We com- 
plement this with a nested- volume approach of N-body runs 
to probe smaller masses in a way to sample the mass func- 
tion over 5 decades in mass. 

Before proceeding we recall that in hierarchical mod- 
els of structure formation, as our present cosmological 
paradigm, halos are often found merging or accreting rather 
than isolated, and therefore have no definite boundary. 
Hence defining a halo and its mass becomes rather conven- 
tional. 

The are two widely used conventions. The spherical 
overdensity (SO, Lacey & Cole 1994) halos are defined as 
spherical regions around matter density peaks with an in- 
ner density larger than a given threshold, which is gener- 
ally taken as a fixed multiple of the critical or background 
density. Alternatively, the Friends-of-Friends (FoF, Davis et 
al. 1985) algorithm identifies all the neighbors to a given 
particle separated by less than a fixed distance (the linking 
length). The same algorithm is then applied to each neigh- 
bor recursively until no more "friends" are found. The end 
result is a group of particles in space whose boundary ap- 
proximately matches an iso-density contour (e.g. see Fig 1 
in Lukic et al. 2009). 

Each definition has advantages and disadvantages, and 
the overall convenience of one over the other is currently an 
open debate (Jenkins et al. 2001; White 2002; Tinker et 
al. 2008; Lukic et al. 2009). The abundance of FoF halos 
has been found universal at < 10% level by different studies 
including ourselves in Sec. [5] (Jenkins et al. 2001; Reed et al. 
2003; Heitmann et al. 2006; Reed et al. 2006; Lukic et al. 
2007; Tinker et al. 2008), with some dependence on the link- 
ing length (Jenkins et al. 2001; Tinker et al. 2008). Given 
the complicated non-linear processes that drive halo forma- 
tion, having a universal mass-function (i.e. independent of 
red-shift and cosmology) at this level could be very conve- 
nient alleviating the need to simulate individual cosmolo- 
gies. In contrast, the abundance of SO halos is considerably 
less universal, showing evolution in the halo mass function 
amplitude at the level of 20% — 50%, depending on cosmol- 
ogy, but also evidence for red-shift dependence of its overall 
shape (Tinker et al. 2008). The current error budget of clus- 
ter science is dominated by the unknowns in the calibration 
of mean and scatter of the mass-observable relations (e.g. 
Voit 2005; Rozo et al. 2009; Stanek et al. 2009b). Only 
after understanding these better we will be able to state 
what is an acceptable degree of non-universality in future 
precision cosmology (but see also Stanek et al. 2009a). 

In turn, the general idea of defining virialized structure 
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and mass in terms of spherical apertures is more directly 
linked to observations of galaxy groups and clusters (e.g. 
Voit 2005), in part because the scaling correlations between 
cluster observable and mass are tighter in this approach. 
Nonetheless, relating iso-density methods such as FoF is also 
possible (e.g Eke et al. 2006; Tago et al. 2008; Wen et al. 
2009). 

For concreteness, we will next focus on FoF halos and 
leave SO for a follow-up study, although we have tested to 
some extent that our main results are robust in front of this 
choice, as we further discuss in our Conclusions. 

This paper is organized as follows: In Sec. [2] we describe 
the MICE simulations. Sec. [3] recaps known theoretical pre- 
dictions and fits to the halo mass function and concludes 
with a comparison between MICE and results from previ- 
ous simulations. In Sec. [4] we discuss systematic effects that 
are most relevant in the measurement of the high-mass end 
of the mass function, such as transients from initial con- 
ditions, finite sampling of the mass distribution, and mass 
resolution effects. A detailed error analysis including differ- 
ent estimators is provided in Sec. [6] whereas in Sec. [7] we 
derive a new fitting function to account for the high-mass 
tail of the halo mass function. The higher redshift evolu- 
tion, including results regarding mass function universality, 
is the subject of Sec. [8] We discuss the implications of our 
results for dark-energy constraints in Sec[9] and we finish by 
summarizing and discussing our main findings in Sec. \W\ 

2 THE MICE SIMULATIONS 

The set of large N-body simulations described in this pa- 
per were carried out on the Marenostrum supercomputer at 
the Barcelona Supercomputing Center ( http: 77www.bsc.es [ l , 
hence their acronym MICE (Marenostrum-Instituto de 
Ciencias del Espacio). 

All simulations were ran with the Gadget-2 code 
(Springel 2005) assuming the same fiat concordance LCDM 
model with parameters f2 m = 0.25, Qa ~ 0.75, = 0.044, 
and h — 0.7. The linear power spectrum had spectral tilt 
n s — 0.95 and was normalized to yield as = 0.8 at z = 0. 
Special care was taken in order to avoid spurious artifacts 
from the initial conditions (transients). Thus, the initial par- 
ticle distributions were laid down either using the Zeldovich 
approximation with a high starting redshift or 2nd order La- 
grangian Perturbation Theory (2LPT) (Scoccimarro 1998; 
Crocce et al. 2006) (see Sec. 14. II for details). 

The main goal of the MICE set is to study the forma- 
tion and evolution of structure at very large scales, with 
the aim of simulating with enough mass resolution the size 
of future large extra-galactic surveys, such as DES (Annis 
et al. 2005) or PAU (Benitez et al. 2009), and test ro- 
bustly statistical and possible systematic errors. Fig[T]shows 
the set of MICE simulations in the mass resolution-volume 
plane. They sample cosmological volumes comparable to 
the SDSS main sample (0.1 h~ 3 Gpc 3 ), the SDSS-LRG sur- 
vey (l/i _3 Gpc 3 ), PAU or DES (9 ft" 3 Gpc 3 ), and those of 
huge future surveys such as EUCLID (~ 100 ft -3 Gpc 3 ), in 
combination with mass resolutions from 3 x 10 12 /i -1 Mo 
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Figure 1. The MICE simulations in the mass resolution-volume 
plane: they span over volumes comparable to the SDSS-main sam- 
ple (0.1 h~ 3 Gpc 3 ), SDSS-LRG survey (1 h' 3 Gpc 3 ), DES or PAU 
surveys (9 h~ 3 Gpc 3 ), and up to huge volumes such as the planned 
EUCLID mission (100 h~ z Gpc 3 ), and deliver mass resolutions 
from 3 X 10 12 h~ x Mq down to 3 X lO s /i _1 M . In turn, the 
largest volume simulations (big squares) map the mass function 
at the high- mass end, ~ 10 15 h~ 1 Mq, whereas the test simula- 
tions (small triangles) extend the dynamic range down to halos 
of 10 10 h- 1 Mq. 



down to 3 x 10 8 Mq. In turn, the largest volume simula- 
tions (squares) map the mass function at the high-mass end, 
~ 10 15 ft -1 Mq, whereas the test simulations (triangles) ex- 
tend the dynamic range down to halos of 10 10 h' 1 M . Table 
[T] summarizes the identifying parameters of the main MICE 
simulations. 

Notice that for one particular case (MICE1200) we im- 
plemented a set of 20 independent realizations, in order to 
compare statistical errors on different quantities obtained 
from an strictly "ensemble error" approach from other in- 
ternal or external error estimates. 

In addition to the production of comoving outputs at 
several redshifts, we have constructed projected density and 
weak lensing maps as well as ligth-cone outputs from the 
main MICE runs. The mass projected and lensing catalogues 
were discussed in (Fosalba et al. 2008), while the light-cone 
catalogue will be presented in future work. Note that both 
represent a huge compression factor (~ 1000), that may turn 
essential in dealing with very large number of particles as in 
our case. Further details and publicly available data can be 
found at http://www.ice.cat/mice 
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Run 



A, 



part 



J h,- 1 Mpc 



m v l h' 1 M 



'soft/'* lK P c 



1C 



MICE7680 
MICE3072 
MICE4500 

MICE3072LR* 
MICE768* 
MICE384* 
MICE179* 
MICE1200* (x20) 



2048 3 
2048 3 
1200 3 

1024 3 
1024 3 
1024 3 
1024 3 
800 3 



7680 
3072 
4500 

3072 
768 
384 
179 
1200 



3.66 x 10 12 
2.34 x 10 11 
3.66 x 10 12 

1.87 x 10 12 
2.93 x 10 10 
3.66 x 10 9 
3.70 x 10 s 
2.34 x 10 11 



50 
50 
100 

50 
50 
50 
50 
50 



ZA 
ZA 
2LPT 

ZA 
2LPT 
2LPT 
2LPT 

ZA 



150 
50 
50 

50 
50 
50 
50 
50 



Table 1. Description of the MICE N-body simulations. ./Vp ar t denotes number of particles, Lbox is the box-size, m v gives the particle mass, 
Isoft is the softening length, IC is the type of initial conditions (Zeldovich Approximation, ZA, or 2nd order Lagrangian Perturbation 
Theory, 2LPT), and z;„ is the initial redshift of the simulation. Their cosmological parameters were kept constant throughout the runs 
(see text for details), the initial global time-step is of order 1% of the Hubble time (i.e, ciloga = 0.01, being a the scale factor), and 
the number of global timesteps to complete the run N B teps ~ 2000 in all cases. We ran an ensemble of 20 different realizations with the 
parameters of MICE1200 primarily to calibrate error estimators. We mark with * those runs that were done for completeness or testing 
as main purpose. 



3 THE HALO MASS FUNCTION 



3.1 Theoretical Predictions 



The very large simulated volume spanned by the MICE set 
allow us to study accurately not only Milky Way size ha- 
los, but specially the most massive and rarest halos formed 
by hierarchical clustering. To this end we built dark matter 
halo catalogues at each snapshot of interest according to the 
Friends-of-Friends (FoF) algorithm (Davis et al. 1985) with 
linking length parameter b set in units of the mean inter- 
particle distance in each simulation. We will refer to halos 
defined in this way as FoF(b). For the most part we will 
deal with the b — 0.2 catalogues, although we have also im- 
plemented b — 0.164 for a first validation of our simulations 
against the Hubble Volume Simulation (HVS) (Jenkins et al. 
2001; Evrard et al. 2002). The HVS is one of the very few 
publicly available halo catalogue comparable in simulated 
volume to MICE. 



The halo finder algorithm was implemented using 
the FOF code publicly available at the iV-body Shop 
(http://www-hpcc.astro.washington.edu/), with some addi- 
tional modifications needed in order to handle the large 
number of particles in reasonable amount of time. The re- 
sulting halo catalogues contain not only the mass, position 
and velocity of the center of mass, and virial velocity, but 
also information of all the particles forming each halo. 



As an example of the size of our outputs we mention 
that MICE3072 contains at z — a total of about 25 million 
halos more massive than 3.9 x 10 12 h~ x Mq if the minimum 
number of particles per halo is set to 20. The most massive 
object weighs 5.27 x 10 15 h' 1 M and is made of 22, 561 
particles. In turn, MICE7680 contains about 15 million halos 
with mass greater than 7.3 x 10 13 h -1 Mq, with the biggest 
reaching 8.4 x 10 15 h' 1 M . 



Let us start by recalling some well known results regarding 
the abundance of halos. The differential mass function is 
defined as, 



M dn(M, z) 
p b din a~ 1 (M, z) 



(1) 



where n(M, z) is the comoving number density of halos with 
mass M and cr(M, z) is the variance of the linear density field 
smoothed with a top hat filter of radius R and enclosing an 
average mass M = Pb 4nR 3 /3, 



a 2 (M,z) = D 



2tt 



§t J k 2 P(k)W 2 (kR)dk, 



(2) 



with 



W(x) 



(x) 



-[sin (a;) — :rcos(:r)]. 



In Eq. ([2}, D(z) is the linear growth factor between z = 
and the redshift of interest, and P(k) the linear power 
spectrum of fluctuations at z — 0. 

In Eq. ([1]) we have explicitly assumed that all the cos- 
mology dependence of the differential mass function enters 
through the amplitude of linear fluctuations, Eq. ([2]), at the 
mass scale M. If the redshift dependence also satisfies this 
condition the halo abundance as defined by Eq. {1} is said 
to be universal (Press & Schechter 1974; Sheth & Tormen 
1999; Jenkins et al. 2001). 

Several analytical derivations (Press & Schechter 1974; 
Bondetal. 1991; Sheth et al. 2001) or fits (Sheth & Tormen 
1999; Jenkins et al. 2001; White 2002; Reed et al. 2003; 
Reed et al. 2006; Warren et al. 2006; Tinker et al. 2008) 
have been provided in the literature over the past years, 
starting from the original Press-Schechter formalism in 1974 
(Press & Schechter 1974). In our work we will refer only to 
the Sheth and Tormen (ST) fit given by (Sheth & Tormen 
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1999), 



3.3 Comparison with previous work 



/st(o-) = A\ 

V IT O 



exp 



qSV 
2a 2 



(3) 



with A = 0.3222, q = 0.707 and p = 0.3. In addition we 
will take the value of the linear over-density at collapse as 
5 C = 1.686, and ignore its weak dependence on cosmology 
(Lacey & Cole 1993; Jenkins et al. 2001). The subsequent 
Jenkins fit (Jenkins et al. 2001), 



/jenkins(a") = ^4exp [-| logo" 1 + 6|' 



(4) 



with A = 0.315, b — 0.61 and c = 3.8 corresponding to 
FoF(0.2) halos, that was obtained at redshifts z — — 5 
over the range —1.2 < In cr _1 < 1.05. For our cosmology 
this corresponds to masses (0.96 x 10 10 -4.0 x 10 15 ) ft -1 M . 
Alternatively, we will refer to his fit for FoF(0.164) halos for 
which A = 0.301, b = 0.64 and c = 3.88 (Eq. B2 in Jenkins 
et al. 2001). This is valid over the mass range (8.7 x 10 10 — 
3.4 x 10 15 ) h' 1 Mq in our cosmology. 

More recently Warren et al. 2006 performed a detailed 
mass function analysis using a set of nested-volume sim- 
ulations and provided the following values for the best-fit 
parameters of a ST-like mass function, 



/Warrcn(cr) = A [ff ° + b] exp [~ ""jl 



(5) 



with A = 0.7234, a = 1.625, b = 0.2538, c = 1.1982, obtained 
from a fit to the mass range (10 10 - 10 15 ) h' 1 M Q at z = 0. 
We will use Eq. ([5} as our bench-mark reference fit. 

3.2 The binned Mass Function 

In order to compare the predictions for the differential mass 
function in Eq. (flj with the observed halo abundance in a 
simulation of volume Ll ox one would measure the number 
of halos AN in a given mass-bin [Mi — M2] of width AM 
and characteristic mass M, and define, 



dn 



M AN 

Cam 



that is then related to the differential mass function in 
Eq. fT} after multiplying by the prefactor — u/a'pt. How- 
ever for the most part we will directly compare the number 
density of halos in a given mass bin with the prediction 
binned in the same way as the measurements. That is, from 
Eq. ([1]) we obtain the theoretical number density of objects 
per unit mass dn/dm, which is then integrated as, 



M 2 



Ml 



dn 
dm 



dM 



M 2 



Mi 



Z£lL^ f{M ,z)dM 
M a dM ' ; 



(7) 

to predict AN/V. The corresponding value of the mass of 
the bin is obtained as 

r-M 2 



Mhin 



da 
dm 



MdM 



(«) 



from the theory and as a mass weighted average, M = 
X^bin Mi/ AN, from the simulation. Throughout our study 
we used mass bins equally spaced in log-mass, with 
Alog 10 M/( hT 1 Mq) = 0.1. We have tested that our con- 
clusions do not depend on this particular choice. 



As a first validation of our set of large volume simula- 
tions we compared the halo abundance in MICE3072 to 
that in the Hubble Volume Simulation (HVS) (Jenkins et 
al. 2001; Evrard et al. 2002), since they both simulate al- 
most the same total volume. The HVS is among the biggest 
simulated volume with halo data publicly available Q at 



http:/ /www. mpa-garching.mpg.de/Virgo/hubble. html 

We used the catalogue corresponding to a A-CDM cos- 
mology and FoF halos with linking length parameter b = 
0.164 (Jenkins et al. 2001). Thus, in what follows we will 
refer to the MICE catalogues for this value of b. Finally, 
for this comparison we employed the low resolution run of 
MICE3072 (MICE3072LR, see Table □} that has a similar 
mass resolution to that in the HVS. 

Figure [2] shows the ratio of the mass functions mea- 
sured in the HVS and MICE3072LR at z = 0. The higher 
abundance of massive halos in the HVS is due mainly to its 
larger value of as, equal to 0.9 against 0.8 in MICE. There- 
fore we include in the figure the expected value for this ratio 
as predicted by the Jenkins fit in Eq. Q (or Eq. B2 in Jenk- 
ins et al. 2001). The difference between the measured ratio 
and the prediction are within the claimed accuracy for the 
Jenkins fit (10 — 15 %). Nonetheless notice that the HVS 
has a rather late start z% = 34 that leads to an artificially 
lower abundance (see Tinker et al. 2008 for a discussion on 
this). If one corrects for this effect, the ratio of data-points 
in Fig. [2] increases by about 5% at 10 15 h~ x Mq explaining 
part of the difference. We thus conclude that for matching 
volume (and particle mass) our MICE run agrees with the 
HVS. 



4 SYSTEMATIC EFFECTS 

Measurements of the high-mass end of the halo mass func- 
tion are potentially affected by a number of systematics. 
Below we investigate in detail the most relevant ones: the 
impact of the choice of initial conditions, discrete sampling 
of the halo mass profile and mass resolution effects. 



4.1 Transients from initial conditions 

Several potential sources of systematic errors must be con- 
sidered and controlled when implementing an N-body run, 
with their relevance sometimes dictated by the regime at 
which one is interested (see Lukic et al. 2007 for a detailed 
analysis). We have performed convergence test regarding 
force and mass resolution, initial time steeping, finite vol- 
ume effects, and more. But of particular relevance to the 



2 We do not include a comparison to the recently available data 
from the Horizon run (Kim et al. 2008) because of potential 
systematic issues in that simulation that are expected to affect 
the abundance of the most massive halos in a substantial manner, 
such as a low starting redshift 2; = 23 combined with the ZA (as 
discussed in Sec. I4.lt . Besides, this simulation uses a rather large 
force softening length / E = 160 kpc and low number of global 
time-steps, N B teps = 400. 
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Figure 2. Comparison to the Hubble Volume Simulation (HVS): 
We show results for the FoF halo mass functions at z = for a 
linking length parameter b = 0.164 for the HVS in ratio with 
our low resolution run of MICE3072 (see the MICE3072LR entry 
in Table 1). We note that both runs have a similar volume and 
particle mass, but they simulate different cosmologies (mainly a 
lower erg for MICE3072LR). The solid line corresponds to the 
prediction for this ratio using the Jenkins fit for FoF(0.164) in 
Eq. Q. The difference between symbols and the prediction is 
within the accuracy claimed for the Jenkins fit. In all cases we 
show only halos with no less than 50 particles, and Poisson errors. 



abundance of the largest halos at a given output is the initial 
redshift and the approximate dynamics used to set the ini- 
tial conditions and start the run (Scoccimarro 1998; Crocce 
et al. 2006; Tinker et al. 2008; Knebe et al. 2009). 

The generally adopted way to render the initial mass 
distribution is to displace the particles from a regular grid 
or a glass mesh, using the linear order solution to the equa- 
tions of motion in Lagrangian Space. This is known as Zel- 
dovich approximation (ZA). Particle trajectories within the 
ZA follow straight lines towards the regions of high initial 
overdensity. The ZA correctly describes the linear growth 
of density and velocity fields in Eulearian Space but, failing 
to account for tidal gravitational forces that bend trajecto- 
ries, underestimates the formation of non-linear structure. 
To leading order this can be incorporated using the second 
order solution in Lagrangian Perturbation Theory (2LPT) 
effectively reducing the time it takes for the correct grav- 
itational evolution to establish itself (known as transients) 
once the N-body started. During the period where transients 
are present the abundance of the most massive objects, that 
originate from the highest density peaks, is systematically 
underestimated. 

In Crocce et al. 2006 it is shown that transients affect 
the z — mass function reducing it by 5% at 10 15 /z _1 Mq 
if ZA, as opposed to 2LPT, is used to start at Zi = 49. This 
value rises to 10% for M > 2 x 10 14 h' 1 M Q at z = 1. Also 
Tinker et al. 2008 finds evidence for transients in the HOT 
runs introduced in Warren et al. 2006 and the HVS (Jenkins 
et al. 2001). These runs were started in the redshift range 
z = 24 to 35 using ZA. However their own run with Zi — 60 
is in good agreement with the 2LPT predicted abundance 
from Crocce et al. 2006 by z = 1.25. The impact of the 



starting redshift in the high redshift mass function has been 
investigated in Lukic et al. 2007; Reed et al. 2006; Reed et 
al. 2003. 

To test the significance of transients ourselves we im- 
plemented two runs of MICE1200 (L bo x = 1200 h' 1 Mpc 
and 800 3 particles) using ZA and 2LPT, both with z t = 50, 
and the same initial random phases (not listed in Table (TJ. 
Figure [3] shows the ratio of the measured mass functions, 
"-za/w2lpt, at 2: = (top panel), z = 0.5 (middle) and 
2 = 1 (bottom). Top and bottom panels show in addition the 
results obtained by Crocce et al. 2006 for the same combi- 
nation of [z, Zi] with a solid line. The dash line in the middle 
panel corresponds to a simple 2nd order polynomial fit to 
the ratio at z = 0.5. Our results agree very well with those in 
Crocce et al. 2006 despite the difference in cosmology of the 
N-body runs (most notably erg), confirming an underestima- 
tion of the halo abundance by ~ 5% at M ~ 10 15 , 3.16 x 10 14 
and 10 14 Mq at z = 0, 0.5 and 1 respectively (and larger 
for larger masses, at fixed redshift) if ZA z, — 50 is used 
instead of starting at higher redshift or using 2LPT. 

In line with the results above, almost all our runs in 
Table Q] were started using 2LPT at Zi = 50 to avoid tran- 
sients in the low-redshift outputs. The convergence of 2LPT 
with Zi ~ 50 is discussed in detail in Crocce et al. 2006. 
For MICE7680 we implemented ZA at high starting red- 
shift (z.i = 150) to minimize transients. In this case the 
convergence is assured by the results in Fig. [5] where its 
halo abundance is compared with the one in MICE4500, 
that was started with 2LPT at Zi = 50 with completely dif- 
ferent random phases (particle load and volume are differ- 
ent). The measured abundance is practically indistinguish- 
able. The only run expected to be affected by transients was 
MICE3072 that uses ZA at z 4 = 50. In what follows we will 
therefore correct the mass function measured in MICE3072 
by a simple fit to the ratios shown in Fig. [3] 

4.2 FoF Mass Correction 

As noted by Warren et al. 2006 the mass of halos deter- 
mined using the FoF algorithm suffer from a systematic over- 
estimation due the statistical noise associated with sampling 
the mass density field of each halo with a finite number of 
particles. By systematically sub-sampling an N-body simu- 
lation and studying the associated FoF(0.2) halo abundance 
(keeping the linking length parameter fixed) Warren et al. 
2006 determined an empirical correction of the mass bias 
that depends solely in the number of particles n v compos- 
ing the halo through the simple expression, 

corr /-i 0.6\ /n\ 

n p =n p (l — n p ). (9) 

However, as remarked by Lukic et al. 2007, the correction 
should be checked in a case-by-case basis since it is not the 
result of a general derivation (see also Tinker et al. 2008). 
While it is true that for well sampled halos the correction 
is relatively small (e.g. 2.5% for halos with 500 particles) 
the impact that a few percent correction to the mass has in 
the halo abundance can be large is one refers to the most 
massive halos living in the rapidly changing high-mass tail 
of the mass function, as we are investigating in this paper. 
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Figure 3. Transients in the mass function: ratio of mass func- 
tions measured at z = (top panel), z = 0.5 (middle) and 2 = 1 
(bottom), for N-body runs similar to MICE1200 (see Table [TJ 
started at z; = 50 using either ZA or 2LPT to set-up the initial 
conditions. The solid line at z = 0, 1 panels displays the result of 
a comparable work done by Crocce & Scoccimarro (2006), but for 
a different cosmology and mass resolution. Middle panel shows a 
2nd order polynomial fit to the ratio "-2LPT />iz A ■ Clearly, the 
approximate dynamics and starting redshift used to set-up initial 
condition plays a substantial role in the abundance of the rarest 
halos. 
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Figure 4. FoF Mass Correction: We tested the correction to 
the systematic mass over-estimation intrinsic to the FoF algo- 
rithm (Warren et al. (2006)) in all of our dynamic range, but 
with particular emphasis at the very high-mass end (i.e. using 
MICE7680 and M1CE4500). We randomly selected one every n 
particles (ra = 2, 4, 8 in red, blue and green respectively) and ran 
the FoF algorithm afterwards with a linking length n 1 / 3 larger. 
The figure shows the corresponding ratio to the fully sampled 
mass function before (top panel) and after (bottom) the correc- 
tion, for the cases of MICE7680 and 768. Notably, in all of our 
simulations the simple expression in Eq. (J9j brings the full and 
sub-sampled mass functions into agreement. 



For this reason we have carried out an independent 
check of the correction in Eq. ([9]), with particular emphasis 
in the regime M > 10 13 - 14 hT 1 M Q , where the mass function 
is exponentially suppressed. 

We randomly sub-sampled every simulation in the 
MICE set to several degrees (1 every n = 2, 4 and 8 par- 
ticles) and run the FoF algorithm afterwards keeping the 
linking length parameter b — 0.2 fixed (i.e. with the link 
length n 1//3 larger in each case). Results are shown in Fig. 2] 
for two representative cases, MICE768 and MICE7680, but 
they extrapolate to all others in Table [1] 

The correction in Eq. ((9]l is able to bring the sub- 
sampled mass functions into agreement with the original 
fully sampled one over the whole dynamic range (up to 
4 x 10 15 h~ x M ). Most notably in the case of MICE7680 
whose particle mass and volume makes it sample the mass 
function exponential tail with low Poisson shot-noise but 
with halos of no more than ~ 2300 particles, what makes it 
very sensible to such mass corrections. Finally, we have also 
tested that varying the factor 0.6 leads to worse matching. 

Hence, in what follows we will refer to the abundance 
of mass corrected FoF halos, unless otherwise stated. 



4.3 Mass Resolution Effects 

For a first glimpse of the abundance of massive objects in 
MICE, we display in Fig. the mass function of FoF(0.164) 
halos obtained from our largest runs (in terms of simulated 
volume), including the corrections for mass and abundance 
discussed above in Sees. (14.1I4.2[) . 

MICE3072 (blue squares) is in good agreement with 
the Jenkins prediction over more than two decades in mass, 
overlapping with the results from the larger MICE7680 
(green circles) and MICE4500 (red triangles) for M in 

10 14-15 h -l M@ 

However, as we transit towards the high-mass end 
(M^ 10 15 hT 1 M©) the abundance in the grand sampling 
volume of MICE7680 rises over the one in MICE3072 (and 
HVS) reaching a 20% difference. In addition, measurements 
in MICE4500 (red triangles) are in very good agreement 
with those in MICE7680 even though these runs correspond 
to completely different initial conditions, softening length, 
box-size, etc (see Table [1]). 

We recall that MICE7680 and MICE4500 have roughly 
the same mass resolution to that in the HVS, but a volume 
16.7 and 3.4 times larger, respectively. In turn, MICE3072 
has roughly the box-size of the HVS, but 8 times better mass 
resolution. 

To check that the "excess" abundance at large masses in 
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Figure 5. Mass Resolution effects on the high-mass end: We 
show the abundance of FoF(0.164) halos at z = in MICE3072 in 
blue squares, MICE4500 in red triangles and MICE7680 in green 
circles. We find a systematic rise over the Jenkins prediction for 
M £ 10 15 h~ 1 M (S . The low-resolution simulation MICE3072LR 
(solid line) and the very-low resolution simulation MICE3072CR 
(dashed line) evidence that our results are robust to mass resolu- 
tion effects at large masses. 



not an artifact due to poor mass resolution we have included 
in Fig. ((5} the mass function measured in MICE3072LR 
and in a very-low mass resolution run not listed in Ta- 
ble ED (L box = 3072/ ! .~ 1 Mpc, N p = 512, and m p = 
1.5 x 10 13 h' 1 M s ). They both agree remarkably well with 
MICE3072 at M > 1O 15 /i _1 M , showing that the abun- 
dance we found using MICE7680 and MICE4500 is robust 
to mass resolution effects, once the Warren et al. 2006 cor- 
rection is taken into account. 

In summary, we have tested the robustness of our re- 
sults in front of several possible systematic effects. Early 
starting redshifts or inappropriate initial dynamics can af- 
fect the high-end mass function at the several % level. Hence 
the MICE simulations used either the 2LPT dynamics, or 
ZA with high-starting redshift [z ~ 150), with the exception 
of MICE3072 whose mass function we nonetheless correct 
as described in Sec. 14.11 In addition halo masses were com- 
puted using the correction of Warren et al. 2006, that we 
independently tested focusing in the regime of very massive 
halos, in order to avoid a bias towards higher masses at low 
particle number. Lastly, we studied whether the rather high 
particle mass of MICE7680 and MICE4500 can impact the 
abundance of massive halos by implementing a set of runs 
with fixed (large) volume and decreasing mass-resolution. 
The abundance of massive halos in these runs are in very 
good agreement once the Warren et al. 2006 correction is 
considered, reinforcing the robustness of our mass function 
measurements in front of mass resolution effects. 



5 THE ABUNDANCE OF FOF(0.2) HALOS 

Let us now turn to the mass function measurements in our 
catalogues of FoF(0.2) halos. Figure [5] shows the measured 



mass function in the MICE simulations tabulated in Ta- 
ble [1] We display the ratios to the Sheth & Tormen fit 
in Eq. (J3J, binned in the same way as the measurements. 
Top panel corresponds to masses corrected for the FoF(0.2) 
bias as described in Warren et al. 2006 and discussed in 
Sec. 14.21 Eq. @. Bottom panel contains un-corrected mass 
functions. In both panels the solid line represents the War- 
ren fit given in Eq. ([5}, while dashed corresponds to Jenk- 
ins fit in Eq. @. The corrected mass functions agree very 
well with the Warren fit, but only up to 10 14 ft -1 Mq. Past 
that mass there is a systematic underestimation of the halo 
abundance in MICE768 and MICE3072 that reaches 20% 
at M ~ 5 x 10 14 h~ x Mq (notice that we show only points 
with relative Poisson error < 5%) . Part of this effect can be 
attributed to transients in the simulations used by Warren 
et al. 2006 to calibrate the high-mass end, as discussed in 
(Crocce et al. 2006; Tinker et al. 2008). 

For larger masses, M ^ M 15 /i _1 Mq, the underes- 
timation of the Warren fit is even more severe, and 
grows monotonously with M. This in part might be due 
to volume effects: the abundance of halos at the high- 
mass end is expected to be extremely low, of order 
nwo//i" 3 Mpc 3 < 10" 7 (z = 0), 10~ 8 (z = 0.5), and 
10" 9 (z = 1) at 10 15 /i _1 M© (integrated over mass bins 
of Alog 10 M = 0.1). This means that already at moder- 
ate redshifts, z — 0.5, a simulation of Lbox = 3/i _1 Gpc 
will contain only about 300 halos and therefore measuring 
the abundance of halos will be the subject to large un- 
certainties, i.e, the expected (shot-noise) error will be al- 
ready of order 6%. By including larger volume simulations, 
such as MICE4500 {L box = 4.5/i _1 Gpc) and MICE7680 
(Lbox = 7.68 h~ l Gpc), we are able to increase the number 
of halos by up to a factor ~ 16, thus decreasing the as- 
sociated halo abundance uncertainties by a factor of ~ 4. 
As shown in Fig[6] (top panel) results from both large vol- 
ume simulations (MICE4500 & MICE7680) agree very well 
in the high-mass end (M > 3 x 10 14 h~ x Mq). This agree- 
ment serves as a validation test for the implementation of 
each of them as well as for the high-mass end result, given 
that these two simulations share the same particle mass but 
have different initial dynamics (2LPT vs. ZA) and random 
phases. 



6 ERROR ESTIMATION 

In a rather general sense the most common source of sta- 
tistical error considered in theoretical studies of halo abun- 
dance is solely the shot-noise contribution (e.g. Reed et al. 
2006; Warren et al. 2006; Lukic et al. 2007; Jenkins et 
al. 2001). The importance of considering sample variance 
in addition to Poisson shot-noise had been highlighted in 
(Hu & Kravtsov 2002), where it is shown that it can not 
be neglected in front of shot-noise for deriving precise cos- 
mological constrains. Following this criteria, Tinker et al. 
2008 recently used jack-knife errors with the intention to 
account for both sampling variance at low mass and Poisson 
shot-noise at high ones. 

To deepen into these considerations we will dedicate 
this section to perform a detailed study of different methods 
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to estimate the error or variance in mass function measure- 
ments. One particular goal is to obtain well calibrated errors 
in order to implement an accurate fit that could improve the 
high- mass description of Eq. ([5} ■ 

We will pay particular attention in comparing how in- 
ternal errors (i.e. those derived using only the N-body for 
which the mean mass functions is measured, such as jack- 
knife) perform against external ones and theoretical predic- 
tions, depending on the mass regime and total simulated 
volume under consideration. 

One of the internal methods that we implemented is 
Jack-knife re-sampling (Zehavi et al. 2005). For this we di- 
vided the simulation volume under consideration into Njk 
non-overlapping regions, and computed the halo number 
density in the full volume omitting one of these regions at 
a time. The variance (denned as the relative error squared) 
in the i-bin of the number density is then obtained as, 

, , r , N .JK 

(i) 2 _ 1 Nj K -I sp ( (i) n (Qs2 (1Q) 

where n (i) is the mean number density of halos for that bin. 
In what follows we will show results using Njk = 5 3 , but 
we have checked that the estimates have already converged 
with varying iVjK- 

Another internal method we considered was to assume 
that the halos are randomly sampled and form a Poisson 
realization of the underlying number density field. In this 
case, 

4 } L m = v*< (ii) 

where Ni is the number of halos in the i mass bin. 

For estimating the variance externally in a volume V, 
we used an N-body of volume Vl, with Vl » V . We then 
divide Vl into several non-overlapping regions of volume V 
and measure the number density in each sub-volume. This 
method, which we refer to as sub-volumes, is similar in spirit 
to boost-trap sampling except that the sub- volumes are not 
thrown at random and do not overlap. Thus, this method has 
the advantage of incorporating the effect of long-wavelength 
modes which are absent in the volume V . 

For example, for mass function errors in MICE179 we 
divided MICE384 in 8 sub- volumes and MICE768 in 80 sub- 
volumes. In this way, the best statistics for the error is 
achieved at the mid-to-high mass regime of MICE179 be- 
cause the mass resolution of MICE768 does not allow to 
test all the way down to M ~ 3.16 x 10 11 h' 1 M (although 
MICE384 does). Nonetheless both MICE384 and MICE768 
leads to a very consistent error estimation in MICE179 for its 
whole dynamic range, showing no dependence on mass reso- 
lution. For MICE384 we divided MICE768 in 8 sub-volumes 
and MICE3072 in 512 sub-volumes. The rest of box-sizes 
follow this same logic, that is, their variance in the mean 
number density was obtained from analyzing the next-in- 
volume runs as listed in Table [T] 

Our last external method is ensemble average. This we 
can only apply to one box-size, Lbox = 1200 h" 1 Mpc, using 
the ensemble of 20 independent realizations of MICE1200 as 
listed in Table [TJ 



Finally, to derive a theoretical estimate of the vari- 
ance in the measured mass function consider fluctuations in 
the mean number density of halos of a given mass, nh(M), 
as coming from two different sources (see Hu & Kravtsov 
2002 for the original derivation). Firstly, a term arising 
from fluctuations in the underlying mass density field S m , 
if we consider the halo number density to be a tracer of 
the mass. If this relation is simply linear and local then 
8nh{M ,yi) / nh = fe(Af)5 m (x), where b is the halo bias. 

Secondly, a shot-noise contribution 5n sn due to the im- 
perfectness of sampling these fluctuations with a finite num- 
ber of objects. This noise satisfies (8n sn ) = and is assumed 
to be un-correlated with 8 m . Furthermore, if we assume the 
halo sample to be a Poisson realization of the true number 
density this error becomes a simple Poisson white-noise with 
variance (8n 2 n ) /n\ = 1/fihV = 1/N, where iV is the total 
number of objects sampled within the volume V. Within 
these assumptions we then have, 

5n h (M,x) = b(M)n h (M)5 m (x) + Sn sn . (12) 

The number density of objects of this mass within the sim- 
ulation is estimated by, 

n= ( d 3 xW(x) [n h + 8n h ] , (13) 
Jv 

where W(x) is the simulation window function, normalized 
such that f d 3 xW(x) — 1. The variance of the number 
density measurements in the simulation is then given by, 

(n 2 }-n 2 h = ^- + b 2 n 2 h jd 3 x. l jd 3 x J W{x l )W(x J ) 

x(<5 m (xi)<5 m (x 3 -)), (14) 

and can be cast as, 

al = <nVn| = 1 + b2 J dp* mkRtmi (15) 
nl nhV J (27r)' 5 

where P(k) is the linear power spectrum of mass. For sim- 
plicity, we will assume the simulation window function W 
to be top-hat in real space, Eq. {3), with smoothing radius 
such that the window volume equals the simulated one, i. e. 
R= (3V/Atv) 1/3 . 

The first term in Eq. (|14fl is the usual shot-noise contri- 
bution to the variance, that we introduced rather had-hoc in 
Eq. (|12p . but it can also be derived in the context of the halo 
model as the contribution from the 1-halo term (Takada & 
Bridle 2007) . The second term, know as sampling variance, 
is the error introduced by trying to estimate the true number 
density using a finite volume. 

As discussed in Sec. 13.21 one is in practice interested in 
the halo abundance within bins of mass range [Mi — M2] and 
characteristic mass M. Thus in Eq. (|15[) we will compute M 
and fih from Eqs. (I7I8P and the bias as, 

1 f M 2 

b h = — b ST (M)(dn/dm)dM, (16) 

n h J Ml 

where bsr is the prediction for the linear bias dependence 
on halo mass from Sheth & Tormen 1999, 

h <M\ -1 1 qSc/o- 2 - 1 2p/8 c , 7 v 

bsT(M ) - 1 + + 1 + m/(j2r (17) 
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with parameters q — 0.707 and p = 0.3 and a — a{M) given 
in Eq. ((2)1 . Equation (117[1 follows from considering variations 
of the unconditional mass function in Eq. ([3jl with respect to 
the critical over-density for collapse 8 C . Thus, strictly speak- 
ing this bias expression should be weighted by dn/dm from 
Eq. © when integrating Eq. |]U (Sheth & Tormen 1999; 
Manera et al. 2009). However we have found that using a 
fit to the MICE mass function instead leads to better agree- 
ment with measurements of clustering in the simulations. 
Accordingly we will use this fit also to estimate M and fin, 
entering in Eq. (|15[) . 

Figure [7] shows the result of our external, internal and 
theoretical error study. Clearly the Poisson shot-noise dom- 
inates the error budget for M Si 10 14 hT 1 Mq correspond- 
ing to omf/MF > 5% (and logcM 1 < 0.06). At smaller 
masses the sampling variance becomes increasingly impor- 
tant rapidly dominating the total error (this is more so 
for smaller box-sizes). Jack-knife re-sampling does capture 
this trend but only partially, in particular for the smaller 
box-sizes ( ^ 500 /i~ 1 Mpc) where sampling variance from 
the absence of long-wavelength modes is more significant. 
This seems to indicate that the jack-knife regions must 
have a minimum volume (e.g. while jack-knife works well 
at 10 13 h' 1 M Q in MICE768, it does not for MICE179 at 
the same mass). This is an important result to consider in 
further studies where jack-knife re-sampling is used to im- 
prove upon Poisson shot-noise. The total error can be under- 
estimated by a factor of a few, e.g. 3 (1.5) at 10 12 /i _ 1 Mq 
(10 13 ) in MICE179. 

On the other hand, the theoretical error from Eq. (|15p 
is in remarkably good agreement with the sub-volumes 
method described above across all box-sizes (not shown in 
MICE7680 and MICE3072 panels of Fig. because is indis- 
tinguishable from the other estimates). This can be taken 
as a cross-validation of these two methods. 

In addition we have tested how these different meth- 
ods compare with the ensemble error in the mass function 
obtained from the 20 independent runs of MICE1200. For 
the sub- volumes estimation we divided MICE7680 in 264 re- 
gions of volume almost identical to that of MICE1200 (as 
well as MICE3072 in 18 regions for consistency checks) . The 
result is that, for Lt ox — 1200 h Mpc, the sub-volume er- 
ror is larger than the ensemble one by a factor of about 
20% at M ~ [10 13 - 3.16 x 10 14 ] /i _1 M . The reason is 
that each sub-volume region "suffers" fluctuations in the 
mean density caused by the long-wavelength modes present 
in the larger box-size from which they have been obtained 
(MICE7680 or MICE3072 in this case). These modes, that 
introduce an extra-variance, are absent in each of the en- 
semble members that satisfy periodic boundary conditions 
at the scale Lt ox = 1200 h" 1 Mpc. Thus, the ensemble error 
(from running different simulations) does not always work 
well because it can suffer from volume effects. 

This conclusion can be nicely reinforced using Eq. (115[) . 
that performs very well here also. One can mimic the absence 
of long-wavelength modes by setting the low-fc limit in the 
sampling variance integral in Eq (|15p to be the fundamental 
mode of MICE1200 (kf = 2tt/1200 ftMpc" 1 ). In this case 
the theoretical model agrees with the ensemble error (in fact, 
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Figure 6. The MICE mass function at z=0, measured after com- 
bining data from the set of MICE simulations with box-sizes 
L b ox = 179, 384, 768, 3072, 4500 and 7680 h^ 1 Mpc (in red, green 
blue, magenta, sea-green and black respectively) and varying mass 
resolutions (see Table[T]for further details). Top panel corresponds 
to the measured mass function after applying the correction to 
the FoF mass as described in Warren et al. (2006). In the bottom 
panel we do not include this correction. In both figures we display 
the ratio to the Sheth and Tormen (1999) prediction and include 
the corresponding Warren and Jenkins fits for reference (solid and 
dashed lines). In each case the low-mass end is set by requiring a 
minimum of 100 particles per halo while the high-mass by requir- 
ing a relative error below 5% (displayed error bars correspond to 
Poisson shot-noise). 



upturns to be mostly dominated by shot-noise). Instead, by 
setting it to kf ~ one recovers the sub- volumes estimate. 

In summary, the sub- volume method should be consid- 
ered as the one comprising all statistical uncertainties: shot- 
noise, sampling variance and volume effects (the fact that 
there are fluctuations in scales larger than the sample size). 
The theoretical estimate in Eq. (|15|l is consistent with it to 
a remarkably good level, for the masses and box-sizes tested 
in this paper. Hence, is a powerful tool for studies involving 
the abundance of massive halos (Rozo et al. 2009; Vikhlinin 
et al. 2009). 



7 THE FITTING FUNCTION FOR THE 
ABUNDANCE OF MASSIVE HALOS 

The accurate sampling of the mass function requires a de- 
manding combination of very big volumes and good mass 
resolution. As shown in Fig[T] using the MICE set of simu- 
lations we have sampled volumes up to 450 ft -3 Gpc 3 with 
a wide range of mass resolutions yielding a dynamic range 
10 8 < M/(h~ 1 M Q ) < 10 12 in particle mass (see also Ta- 
ble ©. 

As shown in Fig.JU] the ST and Warren fits underpredict 
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Figure 7. Sampling variance vs. shot-noise: The panels show dif- 
ferent estimates for the variance in the halo mass function within 
each of the 6 box-sizes used throughout this paper. Dot symbols 
correspond to Poisson shot-noise and is only shown for halos with 
a minimum of 200 particles and until the relative error reaches 
10% (as used in Sec. [7J. Dot-dashed blue line is the result of im- 
plementing jack-knife re-sampling in the given box-size. Solid and 
dashed black lines are the sub-volumes method described in the 
text that uses independent sub-divisions of larger-size boxes. For 
this method we include cases in which two larger volume runs 
are used (panels of MICE179, MICE384 and MICE1200). Empty 
magenta squares are the theoretical estimate in Eq. I I15H that 
includes both sampling and shot-noise variance. The variance is 
shot-noise dominated roughly for M > 10 14 h~ 1 M (depend- 
ing on the box-size). While jack-knife does capture some sam- 
pling variance at smaller masses, it is not fully satisfactory. It can 
under-estimated the error by a factor of 2 - 3 at 10 12 h' 1 Mq. 
Notably, the sub-volumes and theoretical estimates are in very 
good agreement across all box-sizes and full mass range. 



the abundance of the most massive halos found in our N- 
body simulations for M > 10 14 h~ x M , although Warren 
gives accurate results for lower masses. 

In this section we derive new fits based on the MICE 
set of simulations, sampling the mass function over more 
than 5 orders of magnitude in mass, and covering the red- 
shift evolution up to z — 1. For this purpose we use a 
set of MICE simulations with increasing volume and cor- 
responding decreasing mass resolution, in order to sample 
the mass range from the power-law behavior at low masses, 
M ~ 10 10 hT 1 Mq and up to the exponential cut-off at the 
high- mass end. As discussed in Sec.[5j the abundance of ha- 
los at the high-mass end is expected to be extremely low, 
of order n halo / h~ 3 Mpc 3 < 10" 7 (z = 0), 10" 8 (z = 0.5), 
and 10 -9 (z — 1) at 10 15 h~ M (integrated over mass 
bins of Alog 10 M = 0.1), and thus we shall use MICE4500 




12 13 14 15 
Log[M/(h-'M )] 



Figure 8. Mass function fit at z=0. Symbols as in Fig. \E\ Mea- 
surements sample the mass function over more than 5 orders of 
magnitude, 4 X 10 15 > A///i~ 1 M > 2 X 10 10 . Results are ra- 
tioed to the ST fit. The best-fit to the N-body measurements 
("MICE" fit, solid line) is given by the Warren-like mass func- 
tion, Eq (2), with parameters as given in Table [2] The fit agrees 
with N-body data to 2% accuracy in practically all the dynamic 
range (2.5 X 10 15 > M/h' 1 Mq > 2 x 10 10 ). The Warren fit 
(dashed line) matches the N-body to 3% accuracy in the low 
mass end M/ h~ 1 M < 10 14 , but it significantly underestimates 
the abundance of the most massive halos: we find a 10% (25%) 
underestimate at M/ Mq ~3x 10 14 (10 15 ), and larger biases 
for more massive objects. 



(L box = 4.5 bT x Gpc) and MICE7680 (L box = 7.68 ft -1 Gpc) 
in order to get a more accurate measurement of the halo 
abundance in this regime. In practice, we shall combine clus- 
ter counts from both simulations to get a more robust esti- 
mate. As it will be shown below, our fitting functions recover 
the measured mass function over the entire dynamic range 
and its redshift evolution with ~ 2% accuracy. 

For our fitting procedure we use the following simula- 
tions: MICE179,384, 768, 3072, 4500 and MICE7680 (see 
Fig[T] and TablsTTJ and match them so that in overlapping 
mass bins we shall adopt the abundance estimated from the 
simulation with the lower associated error, provided halos 
include a minimum of 200 particles, except for the smallest 
box-size simulation, MICE179, for which we use down to 50 
particle halos. This is done in order to sample small enough 
halos (as small as 10 10 M ) whose abundance has been 
accurately measured in previous analyses (see e.g, Warren et 
al. 2006; Tinker et al. 2008) and that we aim at recovering 
as well with our fitting functions. 

On the other hand, for M ^ 3 X 10 14 we average results 
from both MICE4500 and MICE7680. As shown above, for 
z = 0, results for M > 10 15 /i" 1 M from MICE7680 are 
found to be in full agreement with those of MICE4500 de- 
spite the different initial conditions and time-step size used, 
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A 


a 


b 


c 


X 3 /" 





0.58 


1.37 


0.30 


1.036 


1.25 


0.5 


0.55 


1.29 


0.29 


1.026 


1.20 



Table 2. Mass function best-fit parameters for f(a,z), Eq. J5]l, 
and goodness of fit. 



what provides a robustness test to our measurements at the 
highest mass bins. 

The fit to the mass function is then determined using a 
diagonaf \ 2 analysis, 

N <n {i) -n W ) 2 
X 2 = E (o* (18) 

z — 1 

where n^ t (n^ body ) is the theoretical (N-body) mass func- 
tion integrated over the i-th logarithmic mass bin of width 
Alog lo M/(/i _1 M ) = 0.1, and the errors cr (l) are com- 
puted using the Jack-knife (JK) estimator. The JK estima- 
tor is consistent with theoretical errors and sub- volumes dis- 
persion (see Fig. [7}, except for the smallest mass bins (sam- 
pled by MICE179, see lower right panel in Fig. [7} for which 
the JK errors significantly underestimate other error esti- 
mates. By using JK errors for those bins we just give them 
a larger statistical weight in the \ 2 analysis, thus making 
sure the fit recovers the expected low mass behavior (i.e we 
assume something similar to a low-mass prior). 

We summarize the fitting results for the MICE mass 
functions at z = and 0.5 in Tabled The best-fit to the N- 
body measurements at z — ("MICE" fit, solid line) is given 
by the Warren- like mass function, Eq. ((SJ), with parameters 
A = 0.58, a = 1.37, b = 0.30, c = 1.036 with ^ jv = 1.25. 
As shown in Fig. [8] the fit recovers our N-body data to 
2% accuracy in practically all the dynamic range, that is, 
for 2.5 x 10 15 > M/ih,- 1 M ) > 4 x 10 10 . The Warren fit 
(dashed line) matches the N-body to the same accuracy in 
the mass regime M < 10 14 /i _1 Mq, but significantly un- 
derestimates the abundance of the most massive halos: we 
find a 10% (30%) underestimate at M = 3.16 x 10 14 h' 1 M 
(10 15 h" 1 M ). This can be attributed in part to transients 
in the (Warren et al. 2006) N-body data, as discussed in 
Seed 

At higher-redshifts, the mass function deviates from the 
universal form in Eq. (JS| , and the fitting parameters change 
accordingly. In particular, for z = 0.5 we find the best-fit 
values A = 0.55, a = 1.29, b = 0.29, c = 1.026, yielding a 
X 2 jv = 1.20. As seen in Fig. [5] (left panel), this fit recovers 
the N-body measurements to 2% accuracy in all the dynamic 
range (i.e., for 10 15 > M/ih^M®) > 2 x 10 10 ). The MICE 
fit at z = extrapolated with the linear growth to z = 0.5 
(z = 1) overestimates the measurements by 3 — 6% (10%) as 
the halo mass increases. This in turn shows to what extent 
the FoF(0.2) mass function deviates from universality. 



8 HALO GROWTH FUNCTION 

In the previous section we found that the mass function devi- 
ates significantly from universality (or self-similarity). Here 
we investigate in detail the halo growth function, i.e, the evo- 
lution of the halo abundance with redshift, using the scaling 
evolution of the best-fit parameters as a starting point. 

The evolution of the fitting function parameters with 
redshift, as shown in Fig. [9] indicate that the FoF(0.2) mass 
function, at least for the linking length I = 0.2, is non- 
universal at the 5% — 10% level in the red-shift range tested. 
This is in agreement with previous studies (Jenkins et al. 
2001; Reed et al. 2003; Reed et al. 2006; Lukic et al. 2007; 
Tinker et al. 2008) but is consistently extended here to the 
high-mass regime, hard to sample robustly particularly at 
higher red-shifts. 

To account for this evolution we next try to fit the mass 
function growth with a simple ansatz. If we follow (Tinker 
et al. 2008) and assume that the fitting parameters are a 
simple function of the scale factor, a = 1/(1 + z), we can 
model the evolution as, 

P{z)=P{0)(l + z)- a * ;P = {A,a,b,c} ; a, = {«!,••• ,a 4 } 

(19) 

where P(0) are the fitting parameters at z = 0, as given 
by Table [2] Therefore, we can use the lowest redshift mea- 
surements at z = and z = 0.5 to determine the slope 
parameters aj, 

a a = 0.13, a 2 = 0.15, ay = 0.084, a 4 = 0.024. (20) 

If the ansatz is correct, i.e, the growth of the mass function 
can be modeled to a good approximation with Eq. (|19p . 
one should be able to predict the measured cluster evolution 
at higher redshifts. Using the values of cti as given above, 
we predict the following fitting parameters at z = 1: A = 
0.53, a = 1.24, b = 0.28, c = 1.019, what gives a good 
match to simulations, \ 2 l v ~ 1-92. As shown in both panels 
of Fig. the scaling ansatz recovers the measured mass 
function to 3% accuracy in most of the dynamic range (i.e, 
for 3.16 x 10 14 > M/{ h' 1 M@) > 2 x 10 10 ). 

We conclude from this that the ansatz Eq. (|19[1 can be 
safely used to make predictions about the abundance of the 
most massive halos at intermediate redshifts. 

It has been argued by (Tinker et al. 2008) that the 
non-universality of the mass function is basically a conse- 
quence of the evolution of the halo concentrations, which in 
turn is mostly due to the change of the matter density Q m 
with redshift and thus f(cr) should be rather modeled as a 
function of the linear growth rate of density perturbations, 
D(z). In order to test this hypothesis, we have repeated the 
analysis where the scaling of /(cr) is parametrized as follows, 

P(z) = P(0)(D(z)/D(Q)f< ; fa = {/?!,••• ,P4 (21) 

In this case the slope parameters are found to be, Pi — 
0.22, p 2 = 0.25, p 3 = 0.14, p A = 0.04. Using this model we 
estimate f(cr) parameters at z = 1 to be, A — 0.52, a = 1.22, 
b = 0.28, c = 1.017, what provides a slightly worse fit to the 
the N-body measurements, with x 2 l v ~ 2.85. Therefore, we 
find some evidence in favor of a scaling ansatz based on the 
scale factor with respect to that based on the growth rate. 
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Figure 9. Left: Mass function fit at z=0.5. The best-fit to the N-body measurements is given by Eq. l[5j with parameters as given 
in Table [2] (solid line). This fit matches simulations to 2% accuracy in all the dynamic range (3 X 10 14 > M/h' 1 Mq > 2 x 10 10 ). 
Assuming a self-similar form of the mass function (i.e, using the N-Body fit at z = extrapolated with the linear growth to z = 0.5), one 
overestimates the measurements by 3 — 6% as the halo mass increases (see dashed line). The (self-similar) Warren fit yields an even larger 
overestimate of the halo abundance (3 — 10%). This shows to what extent the FoF(0.2) mass function deviates from universality. Right: 
Mass function fit at z=l. Growth of the mass function with redshift can be accurately modeled with a simple ansatz, Eqs, l|19| l. l|20| l. l|22|l 
(see discussion below, in Sec|8]l, and it provides a good fit to simulations. We find that deviations from universality of the halo abundance 
increase with redshift: the self-similar MICE fit exceed by up to 10% the simulation measurements (and similarly for the Warren fit). 



We note that our analysis is of limited validity since we 
have only considered one cosmology and one should explore 
a wider parameter space to draw stronger conclusions on 
this point. 

In summary, the fit to the FoF(0.2) halo mass function 
measured in the MICE simulations between redshift and 
1 is given by, 



/mice (a, z) = A(z) \o a(z) + &(z)J exp 



c(z) 



(22) 



with A{z) = 0.58(l + z)~ 



a(z) = 1.37(l + z)" 



b{z) 



0.3(1 + , 



c(z) = 1.036(1 + z) 



We can now explore how the halo growth function 
evolves in more detail. For this purpose, we study the 
halo mass function integrated in wider logarithmic bins 
(Alog 10 M/( h^ 1 Mq) = 0.5), and concentrate on the high- 
est mass bins where we find the largest deviations between 
our N-body measurements and available fits. Figure [TOl (left 
panel) shows the halo growth factor as measured in several 
comoving redshifts in the MICE simulations. 

Our simulations show that the abundance of mas- 
sive halos drops by half (one) order of magnitude for the 
logM/th^Mo) = 13.5 - 14 (14.5 - 15) from z=0 to z=l, 
in rough agreement with analytic fits. However, as displayed 
in the right panels of Fig. 1101 the (self-similar) ST and War- 
ren fitting functions (see short and long-dashed lines respec- 
tively) only match the high-mass end of the measured mass 
function at z=0.5 to 15% accuracy. The Warren fit at z=l 



underestimates simulation data by up to 30%. On the other 
hand, using the predicted halo abundance growth from the 
MICE fits at low redshift (red solid line) recovers the mea- 
sured abundance at z=l to better than 1%. We have used 
the scaling functions given by Eq. (|19p . however we have 
checked that our results do not change significantly if we 
use the growth rate ansatz instead, Eq. (|21[) . 



9 COSMOLOGICAL IMPLICATIONS: BIAS ON 
DARK-ENERGY CONSTRAINTS 

The cluster mass function is one of the standard cosmolog- 
ical probes used by current and proposed surveys to con- 
strain cosmological parameters. In particular, the cluster 
abundance as a function of cosmic time is a powerful probe 
to determine the nature of dark-energy. However usage of 
the cluster abundance as a cosmological probe is limited by 
systematics in the mass-observable relations and the poten- 
tial impact of priors (see e.g, Battye & Weller 2003; Weller 
& Battye 2003). 

Here we concentrate on the impact of priors in the mass 
function on extracting the dark-energy equation of state, w. 
As shown in [8J the halo mass function measured in simula- 
tions deviates form self-similarity by as much as 15%, de- 
pending on redshift and halo mass. We shall estimate how 
this systematic departure from universality can potentially 
bias estimates of w. 
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Figure 10. Halo Growth Function: evolution of the halo abun- 
dance with redshift. We show the three most massive logarithmic 
bins (with AlogM/( Mq) = 0.5), where we find the most sig- 
nificant deviations between N-body and available fits. Left panel: 
the abundance of massive halos drops by half (one) order of mag- 
nitude for the logM/{h- 1 M ) = 13.5 - 14 (14.5 - 15) from 
z=0 to z=l, in rough agreement with analytic fits. Right panel:. 
Residuals between analytic fits (lines) to N-body measurements 
(symbols) for the same mass bins (increasing mass from top to 
bottom panels). ST and Warren fitting functions (short and long- 
dashed lines respectively) only match the high-mass end of the 
measured mass function at z=0.5 to 15% accuracy. The extrapo- 
lated Warren fit to z=l underestimates MICE data by up to 30%. 
The predicted halo abundance growth from the MICE fits at low 
redshift (red solid line; see text for details) recovers the measured 
abundance at z=l to better than 1%. 



As a working case, we will consider a tomographic sur- 
vey with photometric accuracy Az, and estimate the shift 
in the recovered value of, w, by including cluster counts in 
redshift shells up to a given depth, z. For simplicity we con- 
sider a constant dark-energy equation of state, although this 
same analysis could be easily extended to a time varying 
w(z). Since the wrong prior on the halo mass function can 
be mistaken by the right cluster abundance for a different 
cosmology, the bias on w, for a given redshift, will be de- 
termined by the relative sensitivity on w of the two cluster 
count probes of dark-energy: the mass function growth and 
the survey volume up to a given depth. 

We perform a \ 2 analysis to determine the bias as a 
function of survey depth by comparing halo counts in red- 
shift shells as follows, 



E 



(n(w 



n ( z ) Nbody) 



rCO" 



(23) 



where n(wy' are the counts from the assumed self-similar 



mass function, for a cosmology with a given value of w, in- 
tegrated from a minimum mass M m in up to some maximum 
mass M m ax, for the redshift shell Zi of width Azi within 
which we can safely consider the mass function to be in- 
dependent of z. In turn, n(z) Nbody are the corresponding 
counts measured in simulations that use the fiducial cos- 
mology with w = — 1, and that are accurately described 
by the scaling ansatz, Eq. (|22[) . The associated error is as- 
sumed to be pure shot-noise given by the N-Body counts, 
o~ = ^/n Nbody Here we make estimates for full-sky surveys 
(i.e, we will draw optimistic forecasts for a given survey 
depth), although smaller areas can be easily incorporated in 
our analyses by scaling the shot-noise error accordingly. We 
shall consider surveys with SZ detected clusters, i.e, with a 
redshift independent mass threshold M ~ 10 14 h' 1 Mq, and 
constant photo-z error Az = 0.1. The upper limit in the 
mass is taken to be 10 15 ' 5 h~ x Mq to avoid possible system- 
atic departures of the N-Body fit used beyond this mass-cut 
with respect to the simulation measurements. However we 
have checked our results do not change if we take larger mass 
cuts. 

Figure [IT] shows the bias on w as a function of survey 
depth z for two different priors on self-similar mass func- 
tions, the ST fit and the MICE fit (i.e, assuming the fit at 
z = extrapolated to higher-z) . We find that the estimated 
bias is robust to better than 20% to changes in the SZ mass 
threshold M m in from 10 13 ' 8 to 10 14 ' 2 (see dashed lines in 
FigtHJ. For the ST fit, the bias can be as large as 50% for 
survey depth z ^ 1, whereas for the MICE self-similar fit 
it reduces to 20% at most for the same depth. The bias at 
low-z results from the relatively small and comparable sen- 
sitivities to changes in w of the geometric (volume) and the 
shape of the mass function growth. On the other hand, the 
strong bias at low-z for the ST prior is due to the poor fit it 
gives to N-Body for the relevant masses M > 10 14 ft^Mg. 
This systematic effect tends to decrease for increasing depth 
as the mass function growth becomes a much stronger func- 
tion of the dark-energy equation of state than the redshift 
shell volume (for constant Az) and thus it determines the 
cluster counts irrespective of the prior on the mass function. 
This trend is strengthen by the fact that the shot-noise er- 
ror per z-shell drops with depth as well (see bottom panel 
of FigtrTJ, so the high-z counts down- weight the observed 
low-z bias on w. Therefore, for deep surveys z ^ 1 any bias 
associated to the mass function tends to be washed out. 



10 DISCUSSION AND CONCLUSIONS 

The abundance of clusters as a function of cosmic time is a 
powerful probe of the growth of large scale structure. In par- 
ticular, clusters provide us with a test of the sensitivity of the 
growth of structure to cosmological parameters such as the 
dark-matter and dark-energy density content, its equation of 
state, or the amplitude of matter density fluctuations. More 
importantly, the abundance and clustering of clusters are 
complementary to other large scale structure probes such as 
the clustering of galaxies, weak lensing and supernovae (see 
e.g, Hu & Kravtsov 2002; Cunha 2009; Cunha et al. 2009; 
Estrada et al. 2009). 
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Figure 11. Bias on w induced by self-similar prior on the mass 
function. Upper panel: Assuming the self-similar ST (MICE) fit 
induces up to 50% (20%) shift in the recovered value of w for shal- 
low or medium depth surveys z 5i 1, whereas the effect tends to be 
negligible for deep surveys. Bottom panel: the relative shot-noise 
error decreases with redshift, what makes high-z cluster counts, 
which are dominated by the strong variation of the mass function 
with w, to down-weight biases induced by the low-z counts. 



In order to plan and optimally exploit the scientific 
return of upcoming astronomical surveys (e.g, DES, PAU, 
PanSTARRS, BOSS, WiggleZ, Euclid) one needs to make 
accurate forecasts of the sensitivity of these probes to cos- 
mological parameters in the presence of systematic effects. 
Accurate determination of the abundance of clusters is lim- 
ited by our knowledge of the relation between cluster mass 
and observable (see e.g, Hu 2003) . One key ingredient in this 
mass-observable calibration is the precise determination of 
the halo mass function. 

In this paper we use a new set of large simulations, 
including up to 2048 3 ~ 10 10 particles, called MICE (see 
Table [JJ to accurately determine the abundance of massive 
halos over 5 orders of magnitude in mass, M = (10 10 — 
10 15 ' 5 ) Mq, and describe its evolution with redshift. No- 
tice that the MICE simulations sample cosmological volumes 
up to 7.68 3 ~ 450 ft -3 Gpc 3 . Armed with these simulations 
we estimate, for the first time, the mass function over a 
volume more than one order of magnitude beyond the Hub- 
ble Volume Simulation size, 3 3 = 27/i _3 Gpc 3 (Jenkins et 
al. 2001; Evrard et al. 2002). Our results have been tested 
against mass resolution, choice of initial conditions, and sim- 
ulation global time-step size. 

Our findings can be summarized as follows: 

• We confirm previous studies (see Crocce et al. 2006) 
showing that an accurate determination of the mass function 



is sensitive to the way initial conditions are laid out. For a 
fixed starting redshift, the usage of 2LPT instead of the ZA 
to set-up the initial conditions helps to avoid the effects of 
transients that tend to artificially decrease the abundance of 
large-mass halos. Alternatively, one can use ZA with a high 
enough starting redshift. In particular, we find an under- 
estimate of the halo abundance by ~ 5% at M ~ 10 , 3.1 x 
10 14 and 10 14 h' 1 M Q at z = 0, 0.5 and 1 respectively (and 
larger for larger masses, at fixed redshift) if ZA Zi = 50 is 
used instead of 2LPT. All our mass functions were safe from 
transients or corrected accordingly for those simulations that 
need it (i.e MICE3072, see Table [TJ. 

• As highlighted by Warren et al. 2006, the mass of halos 
determined using the FoF algorithm suffer from a system- 
atic over-estimation due the statistical noise associated with 
sampling the halo density field with a finite number of parti- 
cles. The precise form of this correction has to be checked on 
a case by case basis. We confirm the results of (Warren et al. 
2006), particularly testing them for large halo masses, that 
the mass correction for the FoF(0.2) halos follows Eq. (|9}. 

• For masses M > 10 14 ft -1 Mq, we find an excess in the 
abundance of massive halos with respect to those from the 
Hubble volume simulation (Jenkins et al. 2001; Evrard et 
al. 2002), once corrected for the different cosmology used, 
that is a few times the Poisson error (see Fig[5]). As argued 
in Sec l4.ll we conclude that this difference can be largely ex- 
plained by systematics due to transients affecting the Hubble 
volume simulation. 

• From an extensive study of error estimates we con- 
clude that: (i) sample variance is significant only for ha- 
los of M < 10 14 /i -1 Mq, and dominates over shot-noise 
for box-sizes Lt ox < 1 h Gpc. Consistently, Poisson shot- 
noise errors under-estimate the total error budget by factors 
2 - 4 at 10 12-13 h~ l M in these volumes, (ii) Jack-knife re- 
sampling is in general consistent with external estimators 
such as ensemble averages, and theoretical errors. However 
it underestimates the total error budget for small box-sizes 
(< 500 h~ J Mpc) where sampling variance is more important 
due to the lack of long-wavelength modes and to the fact 
that jack-knife regions are not independent, (iii) For all our 
runs the theoretical error estimate in Eq. (|15|l is in remark- 
ably good agreement with our external sub-volumes estima- 
tor that incorporates fluctuations due to long-wavelength 
modes in addition to sampling and shot-noise variance at 
the scale of the given box-size. 

• Existing analytic fits (Sheth & Tormen 1999; Warren et 
al. 2006), accurately reproduce our N-body measurements 
at up to 10 14 Mq, but fail to reproduce the abundance of 
the most massive halos, underestimating the mass function 
by 10% (30%) at M = 3.16 x 10 14 h' 1 M (10 15 ft -1 M@). 

• The FoF halo mass function deviates from universal- 
ity (or self-similarity). In particular, the Sheth & Tormen 
(1999) fit, if extrapolated to z > assuming universality 
leads to an underestimation of the abundance of 30%, 20% 
and 15% at z = 0, 0.5, 1 for fixed v — S c /a w 3 (corre- 
sponding to M ~ 7 X 10 14 h' 1 M , 2.5 x 10 14 ft" 1 M Q and 
8 x 10 13 ft -1 M Q respectively, see Fig.[9]|. This is due to some 
extent to the ST not being a very good fit at z = 0. If we 
instead extrapolate our z — MICE best fit, we level of evo- 
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lution in the amplitude of the halo mass function is ~ 5% 
(10%) at z = 0.5 (z = l). 

• We provide a new analytic fit, Eq. ([22)) . that reproduces 
N-body measurements over more than 5 orders of magnitude 
in mass, and follows its redshift evolution up to z = 1, that 
is accurate to 2%. The new fit has the functional form of 
(Warren et al. 2006), but with different parameters to ac- 
count for the excess in the high mass tail (see also Table [2}. 

• Systematic effects in the abundance of clusters can 
strongly bias dark-energy estimates. We estimate that 
medium depth surveys z < 1, using SZ cluster detection, 
could potentially bias the estimated value of the dark-energy 
equation of state, w, by as much as 50%. This effect is how- 
ever an upper limit to the amplitude of this systematic effect, 
and it drops quickly with depth. For deep surveys z ~ 1.5, 
such as DES, the estimated bias is largely negligible. 



As discussed in Sec. Q] there are various ways to define 
halos and their masses, but the most commonly employed 
algorithms are Friends-of-Friends (FoF) and Spherical- 
Overdensities (SO). From an observer view-point it is not 
clear that a single definition is optimal for all kinds of de- 
tection techniques (see Voit 2005 for a review). X-rays 
observations are easier and more robust in the inner cen- 
ter of clusters, that have higher density contrasts and are 
more relaxed. Hence, it seems clear that spherical apertures 
are more appropriate in this case, although with thresh- 
old density contrasts as high as 500 or more (Vikhlinin et 
al. 2009). Optical richness is measured within regions of 
fixed projected physical radius, resembling a modified FoF 
method. Hence, they demand a proper calibration of the 
mass-richness relation (Bode et. al. 2001) if one does not 
want to rely on the correlation with the cluster X-ray proper- 
ties. Cluster detection through the Sunyaev-Zeldovich (SZ) 
effect are relatively recent in statistical terms and suffers 
from several other sources of error (e. g.Carlstrom et al. 
2002). Yet, both halo definitions have been employed in the 
literature. For instance, mock SZ maps have been built and 
exploited using FoF (Schulz & White 2003), and correla- 
tions between X-ray properties and the SZ signal were stud- 
ied starting from SO catalogues (Stanek et al. 2009b). 

Our particular work focused on halos defined through 
the FoF percolation method leaving the SO for a follow- 
up study. Nonetheless we have tested that the trend seen 
towards high masses when comparing different MICE runs 
persist for decreasing linking length, that is, when probing 
the inner most regions of the halos. 

We also leave for future work a more thorough inves- 
tigation of the cosmological implications of our results. In 
particular, it remains to be seen what is the impact of the 
bias on cosmological parameters induced by mass function 
priors in the presence of other systematic effects such as the 
uncertainties in the cluster mass- observable relations and 
the cosmological parameter degeneracies present in cluster 
abundance measurements. 
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